SAADAT, Administrative Patent Judge. 

DECISION ON APPEAL 

STATEMENT OF THE CASE 
Appellants appeal under 35 U.S.C. § 134(a) from a final rejection of 
claims 1 and 3-24, which are all of the claims pending in this application as 
claim 2 has been canceled. We have jurisdiction under 35 U.S.C. § 6(b). 

Appellants invented a method and system for combining language 
model scores, for each of the most likely words in a list, generated by a 
language model mixture in an automatic speech recognition system with a 
reduced word error rate (Spec. 6-7). According to Appellants, a set of 
coefficients is used to combine the language model scores by dividing text 
data for training a plurality of sets of coefficients into partitions, and later 
selecting the set of coefficients for each of the most likely words in the list 
(Spec. 7). 
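
For orientation only, the training idea summarized above can be sketched in Python. This is a minimal illustration under stated assumptions, not the method of Appellants' Specification: the partition rule (a hypothetical threshold on each model's history count), the use of expectation-maximization to fit the weights, and every function name are assumptions introduced here. The record says only that the text data is divided into partitions depending on word counts and that coefficients are chosen to maximize the likelihood of the text data.

```python
from collections import defaultdict


def partition_key(history_counts, threshold=5):
    """Bucket one held-out event by whether each model's history count
    reaches a threshold (hypothetical partition rule)."""
    return tuple(count >= threshold for count in history_counts)


def em_weights(prob_rows, iterations=50):
    """Fit mixture weights that maximize the likelihood of one partition.

    prob_rows holds one tuple per held-out word, giving the probability
    each language model assigned to that word. Probabilities are assumed
    to be strictly positive (for example, smoothed).
    """
    n = len(prob_rows[0])
    weights = [1.0 / n] * n
    for _ in range(iterations):
        totals = [0.0] * n
        for probs in prob_rows:
            mix = sum(w * p for w, p in zip(weights, probs))
            for i in range(n):
                # Share of this word's probability credited to model i.
                totals[i] += weights[i] * probs[i] / mix
        weights = [t / len(prob_rows) for t in totals]
    return weights


def train_coefficient_sets(events, threshold=5):
    """events: iterable of (history_counts, model_probs) pairs drawn from
    held-out text. Returns one weight set per partition."""
    partitions = defaultdict(list)
    for counts, probs in events:
        partitions[partition_key(counts, threshold)].append(probs)
    return {key: em_weights(rows) for key, rows in partitions.items()}
```

Each partition receives its own set of weights, so a model that has seen a history often can be trusted more in that partition than in one where its counts are sparse.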

Claims 1 and 11, which are representative of the claims on appeal, read 
as follows: 

1. In an Automatic Speech Recognition (ASR) system having 
at least two language models, a method for combining language 
model scores generated by at least two language models, said method 
comprising the steps of: 

generating a list of most likely words for a current word in a 
word sequence uttered by a speaker, and acoustic scores 
corresponding to the most likely words; 

computing language model scores for each of the most likely 
words in the list, for each of the at least two language models; 

respectively and dynamically determining a set of coefficients 
to be used to combine the language model scores of each of the most 
likely words in the list, based on a context of the current word; and 

respectively combining the language model scores of each of 
the most likely words in the list to obtain a composite score for each 
of the most likely words in the list, using the set of coefficients 
determined therefor; 

wherein said determining step comprises the steps of: 

dividing text data for training a plurality of sets of 
coefficients into partitions, depending on word counts 
corresponding to each of the at least two language models; and 

for each of the most likely words in the list, dynamically 
selecting the set of coefficients from among the plurality of sets 
of coefficients so as to maximize the likelihood of the text data 
with respect to the at least two language models. 

11. A method for combining language model scores 
generated by at least two language models comprised in an Automatic 
Speech Recognition (ASR) system, said method comprising the steps 
of: 

generating a list of most likely words for a current word in a 
word sequence uttered by a speaker, and acoustic scores 
corresponding to the most likely words; 

computing language model scores for each of the most likely 
words in the list, for each of the at least two language models; 

respectively and dynamically determining a weight vector to be 
used to combine the language model scores of each of the most likely 
words in the list based on a context of the current word, the weight 
vector comprising n-weights, wherein n equals a number of language 
models in the system, and each of the n-weights depends upon history 
n-gram counts; and 

respectively combining the language model scores of each of 
the most likely words in the list to obtain a composite score for each 
of the most likely words in the list, using the weight vector 
determined therefor. 
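
As a reading aid, the combining steps recited in claims 1 and 11 can be sketched in Python. The sketch rests on several assumptions that are not drawn from the record: the composite score is taken to be a weighted sum, the history is modeled as the two preceding words, the weight set is looked up with a hypothetical count threshold, and `ToyModel` and its methods are invented for illustration. It is not a construction of the claims.

```python
from dataclasses import dataclass


@dataclass
class ToyModel:
    """Hypothetical language model: probabilities plus history counts."""
    probs: dict   # (history, word) -> probability
    counts: dict  # history -> times seen in this model's training text
    floor: float = 1e-6  # assumed back-off value for unseen events

    def prob(self, word, history):
        return self.probs.get((history, word), self.floor)

    def history_count(self, history):
        return self.counts.get(history, 0)


def select_weights(models, history, coefficient_sets, threshold=5):
    """Pick the weight vector (one weight per model) for this history."""
    key = tuple(m.history_count(history) >= threshold for m in models)
    return coefficient_sets[key]


def composite_scores(candidates, models, history, coefficient_sets):
    """Return a composite language model score for each candidate word."""
    scores = {}
    for word in candidates:
        # Weights are looked up per candidate; in this sketch the lookup
        # key depends only on the history, so every candidate sharing a
        # history receives the same weights.
        weights = select_weights(models, history, coefficient_sets)
        scores[word] = sum(
            w * m.prob(word, history) for w, m in zip(weights, models)
        )
    return scores
```

With two models, `coefficient_sets` would hold up to four weight vectors, one for each combination of "history seen often" and "history seen rarely" across the models.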

The prior art references relied upon by the Examiner in rejecting the 
claims on appeal are: 

Goldenthal US 5,625,749 Apr. 29, 1997 

Gillick US 6,167,377 Dec. 26, 2000 

Claims 1, 3, 5-13, and 15-24 stand rejected under 35 U.S.C. § 102(e) 
as anticipated by Gillick. 

Claims 4 and 14 stand rejected under 35 U.S.C. § 103(a) as being 
unpatentable over Gillick and Goldenthal. 

Rather than repeat the arguments here, we make reference to the 
Briefs and the Answer for the respective positions of the Appellants and the 
Examiner. 

We reverse. 

ISSUES 

1. Under 35 U.S.C. § 102(e), with respect to appealed claims 1, 3, 
5-13, and 15-24, does Gillick anticipate the claimed subject matter by 
teaching all of the claimed limitations? 

2. Under 35 U.S.C. § 103(a), with respect to appealed claims 4 and 
14, would one of ordinary skill in the art at the time of the invention have 
found it obvious to combine Gillick with Goldenthal to render the claimed 
invention unpatentable? 

PRINCIPLES OF LAW 

1. Anticipation 

A rejection for anticipation requires that the four corners of a single 
prior art document describe every element of the claimed invention, either 
expressly or inherently, such that a person of ordinary skill in the art could 
practice the invention without undue experimentation. See Atlas Powder 
Co. v. IRECO, Inc., 190 F.3d 1342, 1347 (Fed. Cir. 1999); In re Paulsen, 30 
F.3d 1475, 1478-79 (Fed. Cir. 1994). 

2. Obviousness 

The test for obviousness is what the combined teachings of the 
references would have suggested to one of ordinary skill in the art. See In re 
Kahn, 441 F.3d 977, 987-88 (Fed. Cir. 2006), In re Young, 927 F.2d 588, 
591 (Fed. Cir. 1991), and In re Keller, 642 F.2d 413, 425 (CCPA 1981). 

The Examiner can satisfy this burden by showing some articulated 
reasoning with some rational underpinning to support the legal conclusion of 
obviousness. KSR Int'l. v. Teleflex Inc., 127 S. Ct. 1727, 1741 (2007) 
(citing In re Kahn, 441 F.3d 977, 988 (Fed. Cir. 2006)). 

ANALYSIS 

1. 35 U.S.C. § 102(e) rejection of claims 1, 3, 5-13, and 15-24 
Appellants contend that the portions of Gillick which the Examiner 
has relied on for disclosing the recited features of the claims are merely 
pieced together without a logical thread among the cited teachings (App. Br. 
6-7; Reply Br. 2-4). Appellants specifically argue that cited portions of 
Gillick do not teach "determining a set of coefficients to be used to combine 
the language model scores, based on a context of the current word" as 
claimed in claim 1 (App. Br. 5). Appellants further argue that the cited 
portions in column 17 of Gillick merely refer to the frequency with which a 
word occurs in the context of a preceding word (id.). The Examiner 
responds by referring to column 16, lines 20-40 and 50-59, of Gillick and 
stating that the word context is taught since the disclosed equations teach 
updating the technique based on previous determinations (Ans. 9-10). 

We agree with Appellants and find the Examiner's characterization of 
the coefficient equations in Gillick based on a context of the current word to 
be speculative and without factual support. Gillick determines a combined 
score from the scores produced by the language models based on 
interpolation weights (col. 16, ll. 8-11). However, Gillick does not specify 
that the determination of the coefficients is based on a context of the current 
word. As argued by Appellants (App. Br. 5), in the discussion of context in 
relation to the language models, Gillick refers to "the context of a 
preceding word" (col. 17, l. 41) and "the context of a preceding category" 
(col. 17, ll. 51-52), which are different from the claimed language "a context 
of the current word." 

Appellants further argue that the portion of Gillick in column 15, 
which was relied on by the Examiner for allegedly teaching "dividing text 
data for training a plurality of sets of coefficients," has no logical relation 
with determining a set of coefficients (App. Br. 6-7; Reply Br. 3-4). 
Appellants further contend that Gillick "describes a function of a 
control/interface module that collects acoustic information from a user and 
trains a user's models based on the acoustic information" (Reply Br. 3). 
Appellants argue that this description of training in Gillick does not include 
anything regarding the training of a plurality of sets of interpolation weights, 
even if the interpolation weights of Gillick may be taken to be the same as 
the claimed set of coefficients that are used to combine the language model 
scores (id.). 

Based on a review of the cited portions of Gillick, we find ourselves 
persuaded by Appellants' arguments that the Examiner has not pointed to 
any teachings in Gillick for determining the coefficients by dividing text 
data for training a plurality of sets of coefficients into partitions, as recited in 
claim 1. The claimed dividing of the text data into partitions is described in 
Appellants' Specification to involve determining the words on the history 
positions by using the last two words in a trigram to predict the current word 
(Spec. 22:20 - 23:3) and based on the count or frequency of the word pair in 
each language model (Spec. 24:9-15). This process is different from the 
training of the user's model in Gillick. We also disagree with the 
Examiner's reliance on column 16, lines 44-48, of Gillick to conclude (Ans. 
10-11) that taking a part of k words to recognize the best candidate is the 
same as "dividing text data for training a plurality of sets of coefficients into 
partitions." Therefore, as argued by Appellants, we find that the coefficients 
in Gillick are not trained, specifically, by dividing text data into partitions. 

Appeal 2007-4297 
Application 09/782,434 

Therefore, considering the teachings of Gillick and the Examiner's 
line of reasoning based on unrelated teachings in Gillick, we find that the 
Examiner's conclusory statements are not sufficient for satisfying the initial 
burden of presenting a prima facie case of anticipation with respect to claim 
1. Therefore we do not sustain the 35 U.S.C. § 102(e) rejection of 
independent claim 1, nor of claims 3 and 5-10 dependent thereon. 

With respect to claims 11 and 19, Appellants argue that Gillick does 
not teach "determining a weight vector to be used to combine the language 
model scores of each of the most likely words in the list based on a context 
of the current word" (App. Br. 7). For substantially the same reasons 
discussed above, we agree with Appellants that the Examiner has not 
pointed to any teaching in Gillick that the language model scores are 
combined based on a context of the current word. 

Appellants further question the Examiner's characterization of the 
claimed "history n-gram counts" as the counts of a given n-gram in Gillick 
(App. Br. 9). In support of their arguments, Appellants point to the 
Specification (Spec. 20:10-22) wherein a "history n-gram" is described as 
the history (i.e., the previous words) of the current word being determined 
(id.). We find that the portions of Gillick relied on by the Examiner (Ans. 7) 
relate to a language model score that represents the frequency with which 
the pair of the words includes the list word (col. 14, ll. 26-32). Therefore, 
we agree with Appellants that this description of Gillick is different from the 
claimed "history n-gram" as described in Appellants' Specification. In view 
of the analysis of Gillick and the absence of the Examiner's response to 
Appellants' rebuttal, we do not sustain the 35 U.S.C. § 102(e) rejection of 
independent claims 11 and 19, nor of claims 12, 13, 15-18, and 20-24 
dependent thereon, over Gillick. 

2. 35 U.S.C. § 103(a) rejection of claims 4 and 14 
In rejecting claims 4 and 14, the Examiner relies on Goldenthal in 
addition to Gillick. However, the Examiner did not identify any teaching in 
this reference that can overcome the deficiencies of Gillick discussed above. 
Therefore, we do not sustain the 35 U.S.C. § 103(a) rejection of claims 4 and 
14 over Gillick and Goldenthal. 



ORDER 

The decision of the Examiner rejecting claims 1 and 3-24 is reversed. 



REVERSED 